Automated sleep stage classification based on tracheal body sound and actigraphy

The current gold standard for the assessment of most sleep disorders is in-laboratory polysomnography (PSG). This approach is costly and inconvenient for patients. An accessible and simple preliminary screening method to diagnose the most common sleep disorders and to decide whether a PSG is necessary is therefore desirable. A minimalistic type-4 monitoring system which utilizes tracheal body sound and actigraphy to accurately diagnose obstructive sleep apnea syndrome was previously developed. To further improve the diagnostic ability of this system, this study examines whether automated sleep staging is possible using body sound to extract cardiorespiratory features and actigraphy to extract movement features. A linear discriminant classifier based on those features was used for automated sleep staging with the type-4 sleep monitor. For validation, 53 subjects underwent a full-night screening at Ulm University Hospital using the developed sleep monitor in addition to polysomnography. To assess sleep stages from PSG, a trained technician manually evaluated EEG, EOG, and EMG recordings. The classifier reached 86.9% accuracy and a Kappa of 0.69 for sleep/wake classification, 76.3% accuracy and a Kappa of 0.42 for Wake/REM/NREM classification, and 56.5% accuracy and a Kappa of 0.36 for Wake/REM/light sleep/deep sleep classification. For the calculation of sleep efficiency (SE), a coefficient of determination r² of 0.78 was reached. Additionally, subjects were classified into groups of SEs (SE≥40%, SE≥60% and SE≥80%). A Cohen’s Kappa >0.61 was reached for all groups, which is considered substantial agreement. The presented method provides satisfactory performance in sleep/wake and wake/REM/NREM sleep staging while maintaining a simple setup and offering high comfort. This minimalistic approach may address the need for a simple yet reliable preliminary sleep screening in an ambulatory setting.


Introduction
The number of individuals suffering from sleep disorders worldwide is increasing rapidly. However, public awareness of the importance of sleep quality and the implications of sleep disorders is low [1]. A good example of this lack of awareness is obstructive sleep apnea (OSA), the most common cardiorespiratory sleep disorder. Here, 10% of 30- to 49-year-old men, 17% of 50- to 70-year-old men, 3% of 30- to 49-year-old women, and 9% of 50- to 70-year-old women suffer from moderate to severe OSA [2]. OSA can lead to cardiovascular diseases and excessive daytime sleepiness. The resulting cognitive impairment often comes with personal and societal consequences, such as driving and workplace accidents [3]. The gold standard for the diagnosis of sleep disorders is in-laboratory polysomnography (PSG). Its labor-intensive, expensive, and time-consuming nature, paired with the increasing prevalence of sleep disorders, has led to a strong demand for appropriate hospital facilities. Therefore, sleep centers worldwide are typically operating at full capacity and waiting times are long, causing economic losses due to prolonged invalidity. Patients are often reluctant to undergo a PSG since an overnight stay in an unfamiliar sleep laboratory is required. Furthermore, during the first night the extensive recording often worsens their already poor sleep. A possible solution for these issues would be an accessible and simple preliminary screening method for the most common sleep disorders. Based on the results of this initial test, further diagnostic measures like the PSG could be considered. Several less extensive sleep diagnostic methods have been developed. The simplest methods use 1-2 recording channels and are referred to as type-4 sleep studies. They benefit from a low price and a simple setup, and can often be used in a home setting without medical assistance.
In the context of OSA diagnosis, the developed systems use nasal airflow, SpO2, or both [4], [5]. However, these measurement channels introduce several problems and limitations. Mouth breathing or sensor misplacement frequently leads to signal loss. Additionally, those systems cannot perform any sleep staging, and systematic reviews reported poor diagnostic performance for OSA [6], [7]. Sleep stages are of great interest for sleep screening since they are used to evaluate total sleep time, measure the overall level of sleep quality, and detect sleep disruptions. Sleep stages can also be used to diagnose other sleep disorders such as insomnia and circadian rhythm disorders [8]. Human sleep can be classified into the stages wake (W), rapid-eye movement (REM), and three non-REM stages (N1, N2, N3) [9]. N1 and N2 are often grouped into so-called "light sleep", and N3 is often referred to as "deep sleep" [10]. The conventional method for sleep staging is the manual evaluation of the electroencephalogram (EEG) recording carried out during PSG. The EEG comes with several technical challenges and is mostly not fit for use at home or without medical attendance. For ambulatory sleep staging, several automated methods have been developed. These mainly focus on evaluating the variation in heart rate and breathing rate as well as movements. This so-called cardiorespiratory sleep stage classification has been studied extensively in recent years [11], [12], [13] and provides promising results. Here, cardiac features are extracted using electrocardiography (ECG), and respiratory features are extracted using respiratory inductance plethysmography (RIP). Furthermore, studies relying solely on respiratory features to assess sleep stages also showed good correlation (>70%) with sleep stage classification [14], [15], [16]. However, these methods also come with a complex setup and cannot be used without assistance or in a home setting.
Thus, a method which preserves the simplicity of a type-4 sleep study while performing reliable OSA diagnosis and sleep staging for a preliminary screening is desirable. A novel type-4 monitoring system has previously been developed by Kalkbrenner et al. [17], [18]. This monitor utilizes tracheal body sound and actigraphy to screen for OSA [19], [20]. This allows a simple setup and high comfort, minimizing the effect on sleep quality while outperforming similar existing ambulatory diagnostics. The assessment of sleep stages could improve the diagnostic ability for a preliminary screening even further, but was not part of previous research. Therefore, this study examines whether the presented type-4 monitor can perform automated sleep staging, utilizing body sound to extract cardiorespiratory features and an inertial measurement unit (IMU) to extract movement features. This minimalistic approach may address the need for simple yet reliable preliminary screening including sleep staging and OSA diagnosis in an ambulatory setting.

Method
Subjects
Sixty adult subjects were included in the present study. All subjects were referred to the sleep center at Ulm University Hospital with suspicion of OSA. During their overnight stay, all subjects underwent full-night diagnostic PSG screening. Simultaneously, a recording using the new monitoring system was carried out. Recordings only include so-called diagnostic nights without the presence of any therapeutic measures. Data recorded here was also used in another study to validate the ability of the new monitoring device to screen for OSA [19]. The study was approved by the ethics committee of Ulm University, and all subjects gave written informed consent. In total, seven recordings were excluded due to faulty body sound (n=1) and faulty ECG (n=6) recordings. These faulty recordings included subjects suffering from central sleep apnea or mixed forms, and subjects suffering from Cheyne-Stokes respiration. There were no data sets with faulty submental channel recordings. The remaining 53 data sets were considered for sleep staging. Since all

Data acquisition
Trained medical staff set up the PSG and the new monitoring system. Recordings were monitored during the night. The diagnostic recording started between 9 pm and 11 pm and ended between 5 am and 7 am. PSG was carried out using the SOMNOlab PSG system (Co. Weinmann Geräte für Medizin GmbH + Co. KG, Kronsaalsweg 40, 22525 Hamburg, Germany). EEG included channels C3-A2 and C4-A1 with a sampling rate of 256 Hz. Furthermore, submental EMG, unilateral anterior tibial EMG, and bilateral EOG were included and sampled at 256 Hz. The PSG system also included video recording during the night. To assess sleep stages, a trained technician manually evaluated EEG, EOG, and EMG recordings according to the AASM criteria [21]. Each 30-second epoch was assigned to one sleep stage (WAKE, REM, N1, N2, or N3). The oronasal airflow was recorded using a thermistor and sampled at 32 Hz. Additionally, thoracic and abdominal respiratory movements were measured using respiratory inductance plethysmography, sampled at 32 Hz. Oxygen saturation was recorded using finger pulse oximetry, sampled at 16 Hz.
The new monitoring system has previously been described in [17], [18]. This previous research presents the developed monitoring system as a new, reliable, and simplified ambulatory sleep monitor that utilizes only tracheal body sound and movement data to automatically diagnose OSA. The main criterion to indicate the severity of OSA is the apnea-hypopnea index (AHI), which the proposed sleep monitor estimates precisely to reliably diagnose sleep apnea and its severity. Figure 1 shows an abstract illustration of the setup of this monitor. A commercially available body sound microphone was used to record body sound. It was attached to the subject's neck and sampled at 5 kHz. The microphone was designed for the long-term monitoring of lung sounds to diagnose breathing disorders like asthma and is part of a system called LEOSound (Co. Heinen+Löwenstein GmbH & Co. KG, Arzbacher Straße 80, 56130 Bad Ems, Germany). In addition, an inertial measurement unit (IMU) was implemented as an actigraph to record the movements of the subject. The IMU measures acceleration, including gravitational acceleration, and angular velocity using a combination of accelerometers and gyroscopes sampled at 250 Hz. The IMU was attached to the existing thoracic belt of the respiratory inductance plethysmograph of the PSG with a defined orientation. This defined orientation is necessary to evaluate sleep position. For subsequent data analysis, all data were transmitted wirelessly to a laptop.
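The per-epoch AASM labels from the PSG scoring serve as the reference for the 2-, 3-, and 4-stage systems evaluated in this study. As a minimal sketch of how such a reference hypnogram might be reduced, the mapping below groups N1 and N2 into light sleep and treats N3 as deep sleep, as described in the Introduction. The label strings and the function name `reduce_hypnogram` are hypothetical and chosen only for illustration.

```python
# Hypothetical label names; only the groupings themselves follow the text.
FOUR_STAGE = {"WAKE": "WAKE", "REM": "REM",
              "N1": "LIGHT", "N2": "LIGHT", "N3": "DEEP"}
THREE_STAGE = {"WAKE": "WAKE", "REM": "REM",
               "N1": "NREM", "N2": "NREM", "N3": "NREM"}
TWO_STAGE = {"WAKE": "WAKE", "REM": "SLEEP",
             "N1": "SLEEP", "N2": "SLEEP", "N3": "SLEEP"}


def reduce_hypnogram(epochs, mapping):
    """Map a list of per-epoch AASM labels to a reduced staging system."""
    return [mapping[stage] for stage in epochs]


if __name__ == "__main__":
    night = ["WAKE", "N1", "N2", "N3", "REM"]
    print(reduce_hypnogram(night, THREE_STAGE))
```

Keeping the three mappings as plain dictionaries makes it easy to score the same classifier output against each staging system.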

Feature extraction
A large number of signals and their characteristics can be used to classify sleep stages [22]. Using the new monitoring system, breathing cycles, heart beats, and movements can be extracted. Basic research has shown that the dynamics of heart rate [23], [24], [25], [26] and the dynamics of respiration [27], [28], [29] relate to sleep stages. Additionally, it is suggested that subject movements may also relate to sleep stages; they are therefore included in the feature selection. Each feature and its corresponding source are listed in Table 2. The methods to extract the cardiorespiratory features from the tracheal audio signal are described in the following.

Respiratory
The tracheal body sound is utilized to extract the respiratory features. Figure 2 illustrates the key steps of the developed method. The initial raw audio signal consists of breathing, heart sounds, background noise, and movement artefacts. To obtain a pure breathing sound signal, a FIR bandpass filter with a passband of 200 to 2000 Hz is used. To reduce background noise, a noise template is subtracted from the signal in the frequency domain using spectral subtraction [30]. Finally, the signal is divided into short-term windows and the envelope curve E is calculated for each window, where N is the number of samples in the window and x_i is the i-th sample. This curve can then be used to detect breathing cycles and estimate airflow. A broader discussion of the challenges of relating the amount of airflow to tracheal breathing sounds can be found in [20], [31], [32]. With this information, the respiratory features presented in Table 2 can be calculated. Here, most features relate to the variation of the time interval between successive breaths (BB interval).
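The windowed envelope step can be sketched as follows. This is only an illustration under stated assumptions: the text specifies that E is computed from the N samples x_i of each short-term window, and the root-mean-square form used here is an assumed choice, not necessarily the authors' definition. The threshold-crossing detector is likewise a hypothetical stand-in for the breathing cycle detection, and the filtering and spectral subtraction steps are omitted.

```python
import math


def rms_envelope(signal, window_len):
    """One RMS value per non-overlapping window of `window_len` samples.

    Assumed form: E = sqrt((1/N) * sum(x_i^2)); trailing samples that do
    not fill a complete window are ignored.
    """
    envelope = []
    for start in range(0, len(signal) - window_len + 1, window_len):
        window = signal[start:start + window_len]
        envelope.append(math.sqrt(sum(x * x for x in window) / window_len))
    return envelope


def breath_onsets(envelope, threshold):
    """Window indices where the envelope rises through `threshold`.

    Hypothetical detector: a rising crossing is treated as the onset of
    a breath sound.
    """
    return [i for i in range(1, len(envelope))
            if envelope[i - 1] < threshold <= envelope[i]]


def bb_intervals(onsets, window_seconds):
    """Time between successive detected onsets, in seconds."""
    return [(b - a) * window_seconds for a, b in zip(onsets, onsets[1:])]


if __name__ == "__main__":
    audio = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4 + [2.0, -2.0, 2.0, -2.0]
    env = rms_envelope(audio, window_len=4)
    print(env, breath_onsets(env, threshold=0.5))
```

Statistics of the resulting BB intervals (for example their mean and variability per epoch) would then correspond to the kind of respiratory features listed in Table 2.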

Cardiac
The tracheal body sound signal is also utilized to detect heart beats in order to calculate the cardiac features. To suppress breathing and most of the artefacts in the initial raw audio signal, a bandpass filter with a passband of 5 to 30 Hz is applied. An exemplary raw signal is shown in Figure 3A, with the corresponding filtered signal shown in Figure 3B. The filtered signal mostly consists of pairs of distinct peaks generated by physiological heart beats. By searching for the minimal distance from each peak to its adjacent peaks, two peaks can be grouped into one corresponding heartbeat. The presented method enables heart beats to be detected even during snoring. Nevertheless, bad microphone coupling, movements, or similar artefacts can cause the heart beat detection to fail. During those periods, heart beats are interpolated based on preceding values. Finally, all cardiac features presented in Table 2 are calculated. The extraction of cardiac features mostly focuses on heart rate variability (HRV) analysis, based on the evaluation of successive heart beat intervals (NN interval).
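The peak grouping described above can be illustrated with a short sketch. It assumes that peak times (in seconds) have already been detected in the filtered signal, and it interprets "minimal distance to adjacent peaks" as pairing two peaks that are each other's nearest neighbour; this interpretation, the function names, and the choice of the first peak as the beat time are assumptions made for illustration. Interpolation during artefacts is not covered.

```python
def group_heartbeats(peak_times):
    """Pair adjacent peaks that are each other's nearest neighbour.

    Returns the time of the first peak of every pair, taken here as the
    time of the heartbeat (an assumption for this sketch).
    """
    def nearest(i):
        candidates = [j for j in (i - 1, i + 1) if 0 <= j < len(peak_times)]
        return min(candidates, key=lambda j: abs(peak_times[j] - peak_times[i]))

    beats = []
    for i in range(len(peak_times) - 1):
        if nearest(i) == i + 1 and nearest(i + 1) == i:
            beats.append(peak_times[i])
    return beats


def nn_intervals(beat_times):
    """Intervals between successive heartbeats (NN intervals), in seconds."""
    return [b - a for a, b in zip(beat_times, beat_times[1:])]


def mean_heart_rate(intervals):
    """Mean heart rate in beats per minute from NN intervals in seconds."""
    return 60.0 * len(intervals) / sum(intervals)


if __name__ == "__main__":
    peaks = [0.0, 0.3, 1.0, 1.3, 2.0, 2.3]  # synthetic pairs of peaks
    beats = group_heartbeats(peaks)
    print(beats, nn_intervals(beats), mean_heart_rate(nn_intervals(beats)))
```

HRV features such as the standard deviation of the NN intervals could be computed from the output of `nn_intervals` on a per-epoch basis.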

Movements
The IMU is utilized to extract features relating to sleeping position and movements. Methods published by Madgwick et al. [33] are used to process the accelerometer and gyroscope data in order to track the orientation of the IMU. Since the IMU is placed at the thoracic belt, the sensor orientation yields information about the movements and the position of the subject. Details of the utilized processing techniques were previously described by Kalkbrenner et al. [18]. Using this information, it is possible to determine the most prominent sleeping position of the subject during one epoch. Additionally, the changes of sleeping position per epoch are counted. To assess the general movement activity of the subject, the mean acceleration and angular velocity over all degrees of freedom are calculated.
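A rough sketch of the per-epoch movement features is given below. It assumes that a position label per sample has already been derived from the tracked orientation, which is not shown here, and it interprets "mean acceleration over all degrees of freedom" as the mean vector magnitude; both the interpretation and all names are hypothetical. The same magnitude calculation could be applied to the angular velocity samples.

```python
import math
from collections import Counter


def epoch_movement_features(positions, accel):
    """Movement features for one epoch.

    positions: per-sample sleeping position labels (e.g. "supine", "left")
    accel:     per-sample (ax, ay, az) tuples
    """
    # Most prominent sleeping position within the epoch.
    dominant = Counter(positions).most_common(1)[0][0]
    # Number of times the position label changes between adjacent samples.
    changes = sum(1 for a, b in zip(positions, positions[1:]) if a != b)
    # Assumed activity measure: mean magnitude of the acceleration vector.
    mean_acc = sum(math.sqrt(ax * ax + ay * ay + az * az)
                   for ax, ay, az in accel) / len(accel)
    return {"dominant_position": dominant,
            "position_changes": changes,
            "mean_acceleration": mean_acc}


if __name__ == "__main__":
    pos = ["supine", "supine", "left", "left", "left"]
    acc = [(3.0, 4.0, 0.0)] * 5
    print(epoch_movement_features(pos, acc))
```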

Validation
Since the recorded data are limited, a leave-one-out cross validation was carried out to validate the linear discriminant (LD) classifier instead of simply splitting the data into test and training sets. Fifty-three data sets were created, each consisting of 52 (of 53) subjects to train the classifier, leaving the remaining subject for testing. For every data set, the test subject was changed until every subject had been used for testing once. For final evaluation, the classification accuracy for each sleep stage was computed by averaging over the results from all data sets. The classification accuracy is the percentage of epochs of the respective sleep stage that were correctly classified. Additionally, the unweighted Cohen kappa coefficient [36] was calculated. This coefficient is better suited to evaluating classifiers on unevenly distributed data such as sleep stages. The validation was carried out for the 2-stage, 3-stage, and 4-stage systems. One of the essential parts of sleep monitoring, especially in an ambulatory setting, is the measurement of sleep efficiency (SE). The SE is the ratio of sleep time to total time in bed. In this study, the SE is calculated using the results of the 2-stage classifier (SE_est) and compared to the PSG SE (SE_PSG) using correlation analysis. Additionally, the wake after sleep onset time (WASO), total sleep time (TST), and total wake time (TWT) are calculated and compared to the gold standard. For expanded use without medical supervision at home, an easy-to-use indicator for scoring SE is desirable. Therefore, SE ranges of 0%-39%, 40%-59%, 60%-79%, and ≥80% were defined, and subjects were classified into the corresponding groups based on the SE calculated by the 2-stage classifier. The sensitivity, specificity, positive predictive value, negative predictive value, and the unweighted Cohen kappa coefficient [36] of these classifications were calculated.
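The leave-one-out scheme and the unweighted Cohen kappa can be sketched in a few lines. The code below is a generic illustration, not the authors' implementation: the classifier itself is omitted, and the function names are hypothetical. The kappa follows the standard definition, (p_o - p_e) / (1 - p_e), with p_o the observed agreement and p_e the agreement expected by chance from the label frequencies.

```python
from collections import Counter


def leave_one_subject_out(subject_ids):
    """Yield (training_ids, test_id), holding out each subject exactly once."""
    for test_id in subject_ids:
        yield [s for s in subject_ids if s != test_id], test_id


def cohen_kappa(reference, predicted):
    """Unweighted Cohen kappa between two equally long label sequences.

    Undefined (division by zero) if chance agreement is exactly 1, e.g.
    when both sequences contain only one and the same label.
    """
    n = len(reference)
    observed = sum(r == p for r, p in zip(reference, predicted)) / n
    ref_counts, pred_counts = Counter(reference), Counter(predicted)
    expected = sum(ref_counts[c] * pred_counts[c] for c in ref_counts) / (n * n)
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    folds = list(leave_one_subject_out(list(range(53))))
    print(len(folds), len(folds[0][0]))
    print(cohen_kappa(["W", "W", "S", "S"], ["W", "S", "S", "S"]))
```

Because wake epochs are much rarer than sleep epochs in a typical night, a classifier that always predicts sleep can reach high accuracy while its kappa stays at zero, which is why kappa is reported alongside accuracy.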
Receiver operating characteristic (ROC) curves and the corresponding areas under the curves (AUCs) were calculated to evaluate the performance against the PSG results.

Results
Table 3 shows detailed results of the classifier performance of all three staging systems. For each subject, the 2-stage classifier result was used to calculate SE_est. These results were compared to SE_PSG. Figure 4 shows the corresponding correlation plot. The coefficient of determination r² is 0.78. A standard t-test of the paired differences revealed p=0.008, 95% CI=[0.94, 5.98], SD=9.14. Furthermore, Figure 5 shows the correlation of the parameters WASO, TWT, and TST. A detailed performance evaluation of the subject classification into groups of SE≥40%, SE≥60%, and SE≥80% is shown in Table 4. A Cohen's Kappa >0.61 was reached for all groups, which is considered substantial agreement [19]. The corresponding ROC curves are shown in Figure 6.
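The sleep parameters compared against PSG above can be derived from a 2-stage hypnogram roughly as follows. This is a sketch under simplifying assumptions: the hypnogram is taken to cover the whole time in bed in 30-second epochs, WASO is counted as all wake epochs after the first sleep epoch, and the label strings and function names are hypothetical. The group boundaries follow the SE ranges defined in the validation description.

```python
EPOCH_MINUTES = 0.5  # 30-second epochs, as in the PSG scoring


def sleep_parameters(hypnogram):
    """Compute SE (%), TST, TWT and WASO (minutes) from "WAKE"/"SLEEP" labels."""
    total = len(hypnogram)
    sleep_epochs = sum(1 for s in hypnogram if s == "SLEEP")
    # Index of the first sleep epoch; no sleep at all means no WASO.
    onset = hypnogram.index("SLEEP") if sleep_epochs else total
    waso_epochs = sum(1 for s in hypnogram[onset:] if s == "WAKE")
    return {
        "SE": 100.0 * sleep_epochs / total,
        "TST": sleep_epochs * EPOCH_MINUTES,
        "TWT": (total - sleep_epochs) * EPOCH_MINUTES,
        "WASO": waso_epochs * EPOCH_MINUTES,
    }


def se_group(se_percent):
    """Assign a sleep efficiency value to one of the four SE ranges."""
    if se_percent >= 80:
        return ">=80%"
    if se_percent >= 60:
        return "60-79%"
    if se_percent >= 40:
        return "40-59%"
    return "0-39%"


if __name__ == "__main__":
    night = ["WAKE", "WAKE", "SLEEP", "SLEEP", "WAKE"] + ["SLEEP"] * 5
    params = sleep_parameters(night)
    print(params, se_group(params["SE"]))
```

A traffic-light style indicator could be built directly on top of `se_group`, mapping the ranges to colours without exposing the underlying numbers.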

Discussion
The application of a new type-4 sleep monitor based on tracheal body sound and movement data for automated sleep staging was demonstrated, and its performance was validated by comparison against standard PSG sleep staging. The system is designed to allow a simple setup and high comfort, minimizing its impact on sleep quality. It facilitates ambulatory use with no need of medical supervision. It is important to note that current research suggests a general performance limitation of sleep staging based on cardiorespiratory signals caused by subject variability [37]. Additionally, the sleep staging of the PSG was done manually and is therefore open to the subjectivity of the scoring technician. However, since the evaluation of the sleep stages was carried out within the daily business of the sleep laboratory, we were not able to evaluate the intrarater reliability. Future studies must include this evaluation in order to eliminate the uncertainty of the scoring technician. As shown in Table 4, the new sleep monitor provided substantial agreement with the PSG results in classifying subjects into groups of SE. Table 5 shows the results of similar approaches to sleep staging using cardiorespiratory features. It is important to note that those previously proposed methods utilize well-established approaches to record cardiorespiratory signals (e.g. ECG, inductance plethysmography). This comparison reveals that the 2- and 3-stage systems achieved acceptable results. The 4-stage system is clearly outperformed by the best results found in the literature. Nevertheless, the novel method presented in this paper only uses a type-4 monitor, with tracheal body sound recorded with a single lead and movement data, to extract all features presented in Table 2. Furthermore, the most essential parameters for sleep staging, like SE, can be calculated using the 2-stage system. It is suggested that the 2- or 3-stage system suffices for a preliminary screening.
Some researchers also present methods for sleep staging utilizing unobtrusive and comfortable methods. Samy et al. present a high-resolution pressure-sensitive bed sheet to extract sleep-related biophysical and geometric features for sleep staging [38]. An overall accuracy of 71.1% was reached for a 3-stage sleep system while including seven subjects in their study. Sensor foils placed into the bed mattress are used by Kortelainen et al. [39] to extract relevant features and parameters for sleep staging. An overall accuracy of 79% and a Kappa of 0.44 were reached for a 3-stage sleep system while including 18 subjects in their study. These results are similar to those presented in this paper. While offering non-contact and unobtrusive sleep staging, the main drawback of those methods is the excessive noise during body movements, which prevents reliable sleep staging. Additionally, the significance of these studies may be limited by the small number of participants.
The present study holds several advantages and limitations. The setup of the PSG and the developed monitor was performed by previously trained medical staff. The EEG derivations used do not comply with the current AASM standard. However, it can be assumed that the use of different EEG derivations only leads to minor changes in the distribution of the derived sleep stages and no significant differences in scoring reliability [40]. The PSG results cover the entire spectrum from healthy subjects to subjects suffering from severe OSA. Additionally, the sex, age, and BMI distributions cover a wide range of different individuals. All subjects included in the present study were recruited with a suspicion of OSA. The sleep of subjects suffering from OSA is disrupted by arousals caused by breathing pauses. Therefore, at least 42 patients included in the present study do not represent healthy sleep. This may suggest that the presented results are not applicable to the general population. However, 11 subjects included in the present study did not suffer from OSA. A relationship between OSA and sleep staging performance could not be observed. This finding might indicate that the presented method performs as well for healthy subjects as for subjects suffering from OSA. Redmond et al. [11], [12] also suggest that OSA is no limitation for using cardiorespiratory signals for sleep staging. However, further studies are required to fully validate this statement.
To further improve automated sleep staging, subject-specific classifiers or subject-specific feature normalization have been used in several studies [12], [15]. In general, a subject-independent classifier can be set up without calibration or any adjustment, again facilitating use in the homecare area without medical supervision. Nevertheless, subject-specific feature normalization or classifier training can be useful in multi-night studies. Further research should be undertaken to investigate the potential of the classifier in subject-specific classification. Cardiorespiratory features used for classification in this work are solely based on existing research. Tracheal body sound might offer additional features related to sleep phases that are currently not utilized. It can be assumed that it is possible to extract snoring, wheezing, and similar breath-related sounds. Furthermore, since the new sleep monitor is placed at the thoracic belt of the subject, the movements of the chest due to breathing are also reflected within the IMU data. Therefore, the IMU could be utilized to calculate features relating to thoracic or abdominal respiratory movements. In comparison to PSG or ECG, there is no need to apply a large number of additional sensors or electrodes during the night to conduct simple sleep staging and evaluation of SE. Additionally, it is suggested that using fewer sensors leads to better sleep quality and therefore to more reliable results. A previous interview of untrained volunteers regarding ergonomics and user-friendliness of the monitor showed a positive result [17]. The SE classification can be used to create a simple traffic light system (e.g. green meaning "everything is fine", and red meaning "see a doctor"), understandable without medical knowledge. It can therefore be assumed that the new sleep monitor can be used for simple sleep monitoring and preliminary screening at home.
Additionally, previous research showed that the new sleep monitor is also able to diagnose one of the most common sleep disorders, OSA [19], [20]. With sleep staging, it is suggested that the monitor can now also be used to diagnose other sleep disorders, such as insomnia, or for a preliminary screening to decide whether a PSG is necessary. However, further studies need to be carried out to validate the ease of use and the reliability of the new monitoring system in an unattended setting. In summary, the presented method provides high performance for 2- and 3-stage sleep staging while still maintaining a simple setup and high comfort for the patient.

Notes
Competing interests
The authors declare that they have no competing interests.